These exercises are about manipulating single-cell data with Bioconductor packages. Please download the count matrix from Box and load it into a SingleCellExperiment object. Alternatively, you may use the RDS file data/scSeq_CKO_sceSub.rds.

To load the full dataset:

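library(DropletUtils)
library(DropletTestFiles)
# Placeholder: replace with the path to the 10X count matrix
fname <- "~path to the 10X counting matrix"
# col.names=TRUE uses the cell barcodes as column names
sce <- read10xCounts(fname, col.names=TRUE)
# Placeholder: replace with the path where the RDS file should be written
saveRDS(sce,"path to rds file")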

To load the subset:

library(DropletUtils)
library(DropletTestFiles)
sce <- readRDS("data/scSeq_CKO_sceSub.rds")

Exercise 2 - Data manipulation with Bioconductor packages

Identify empty droplets

  1. Please draw a knee plot and identify the inflection point and the knee point.
## Warning in xy.coords(x, y, xlabel, ylabel, log): 1 y value <= 0 omitted from
## logarithmic plot
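
One possible way to approach the knee plot is sketched here. It assumes `sce` is the SingleCellExperiment loaded above and still contains all droplets; the colours and the variable names `bcrank`, `knee` and `inflection` are arbitrary choices, not part of the exercise.

```r
library(DropletUtils)

# Rank every barcode by its total UMI count
bcrank <- barcodeRanks(counts(sce))

# Plot one point per unique rank to keep the plot light
uniq <- !duplicated(bcrank$rank)
plot(bcrank$rank[uniq], bcrank$total[uniq], log = "xy",
     xlab = "Rank", ylab = "Total UMI count")

# barcodeRanks() stores the knee and inflection points in the metadata
knee <- metadata(bcrank)$knee
inflection <- metadata(bcrank)$inflection
abline(h = knee, col = "dodgerblue", lty = 2)
abline(h = inflection, col = "darkgreen", lty = 2)
legend("bottomleft", legend = c("Knee", "Inflection"),
       col = c("dodgerblue", "darkgreen"), lty = 2)
```

Barcodes with a total count of zero cannot be drawn on a log scale, which is why a warning about omitted values may appear.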

  1. Please identify non-empty droplets and compare the results with those from a hard cut-off.
## DataFrame with 400000 rows and 5 columns
##                        Total   LogProb    PValue   Limited       FDR
##                    <integer> <numeric> <numeric> <logical> <numeric>
## CCACGGAAGCTCTCGG-1         0        NA        NA        NA        NA
## TGTGGTAAGAGTCGGT-1         2  -11.1237 0.7561244     FALSE   0.99174
## CCCTCCTTCGTCTGCT-1         2  -18.2572 0.0727927     FALSE   0.66022
## ACTGAGTGTGTCGCTG-1         0        NA        NA        NA        NA
## CGTAGCGTCTCTGCTG-1         0        NA        NA        NA        NA
## ...                      ...       ...       ...       ...       ...
## AGAGCTTGTCCGACGT-1         0        NA        NA        NA        NA
## GCTGCGACAATAGCAA-1         0        NA        NA        NA        NA
## CGAACATAGTGAAGTT-1         2  -12.3017  0.632437     FALSE   0.99174
## TAGCCGGCATGTCGAT-1         0        NA        NA        NA        NA
## CGAGAAGAGCAGATCG-1         0        NA        NA        NA        NA
##    Mode   FALSE    TRUE    NA's 
## logical  211696    2110  186194
##        Limited
## Sig      FALSE   TRUE
##   FALSE 209919   1777
##   TRUE       1   2109
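
A minimal sketch of how results of this kind could be produced with `emptyDrops()` follows. The FDR threshold of 0.001, the seed, the use of the knee point as the hard cut-off, and the name `sce_cells` are assumptions made for illustration; the exercise does not state which values were used.

```r
library(DropletUtils)

# emptyDrops() relies on Monte Carlo simulations, so fix the seed
set.seed(100)
e.out <- emptyDrops(counts(sce))
e.out

# Assumed threshold: droplets with FDR <= 0.001 are called non-empty
is.cell <- e.out$FDR <= 0.001
summary(is.cell)

# Cross-tabulate the calls against the Limited flag
table(Sig = is.cell, Limited = e.out$Limited)

# Hard cut-off for comparison: barcodes above the knee point
bcrank <- barcodeRanks(counts(sce))
hard <- bcrank$total > metadata(bcrank)$knee
table(emptyDrops = is.cell, HardCutoff = hard, useNA = "ifany")

# Keep the droplets called as cells (which() drops the NA entries)
sce_cells <- sce[, which(is.cell)]
```

Non-significant droplets with `Limited` equal to `TRUE` have p-values bounded by the number of iterations; increasing `niters` in `emptyDrops()` may change their calls.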

Data normalization and clustering

  1. Please normalize the data and cluster the cells.
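
The following is one possible workflow, not necessarily the intended solution. It assumes `sce` has already been restricted to non-empty droplets, and it uses deconvolution size factors from scran and graph-based clustering. The seed, the 10% HVG proportion, `k = 10` and the walktrap algorithm are illustrative choices.

```r
library(scran)
library(scater)

set.seed(1000)
# Deconvolution size factors, computed within quick pre-clusters
pre <- quickCluster(sce)
sce <- computeSumFactors(sce, clusters = pre)
sce <- logNormCounts(sce)

# Select highly variable genes (top 10% is an arbitrary choice)
dec <- modelGeneVar(sce)
hvgs <- getTopHVGs(dec, prop = 0.1)

# Dimension reduction on the selected genes
sce <- runPCA(sce, subset_row = hvgs)
sce <- runUMAP(sce, dimred = "PCA")

# Shared nearest-neighbour graph on the PCs, then community detection
g <- buildSNNGraph(sce, k = 10, use.dimred = "PCA")
colLabels(sce) <- factor(igraph::cluster_walktrap(g)$membership)

plotUMAP(sce, colour_by = "label")
```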

Evaluate ambient RNA contamination

  1. Please estimate ambient RNA contamination, remove it, and test the result using the Hba-a1 gene.
## ENSMUSG00000051951 ENSMUSG00000025902 ENSMUSG00000033845 ENSMUSG00000025903 
##       1.669575e-07       1.669575e-07       1.407972e-04       5.201547e-05 
## ENSMUSG00000104217 ENSMUSG00000033813 
##       1.669575e-07       6.783772e-05
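
A hedged sketch of one way to do this with `removeAmbience()` from DropletUtils follows. It assumes that `e.out` is the result of `emptyDrops()` on the unfiltered count matrix, that `sce` holds only the non-empty droplets with log-normalized counts and cluster labels in `colLabels(sce)`, and that gene symbols are stored in `rowData(sce)$Symbol`. The name `stripped` is hypothetical.

```r
library(DropletUtils)
library(scater)

# Ambient profile that emptyDrops() estimated from low-count barcodes
amb <- metadata(e.out)$ambient[, 1]
head(amb)

# Restrict to the genes in the ambient profile, then remove the
# estimated ambient contribution cluster by cluster
stripped <- sce[names(amb), ]
out <- removeAmbience(counts(stripped), ambient = amb,
                      groups = colLabels(stripped))
counts(stripped, withDimnames = FALSE) <- out
stripped <- logNormCounts(stripped)

# Compare Hba-a1 expression per cluster before and after removal
hba <- rownames(stripped)[which(rowData(stripped)$Symbol == "Hba-a1")]
plotExpression(sce, features = hba, x = "label", colour_by = "label") +
  ggtitle("Before ambient removal")
plotExpression(stripped, features = hba, x = "label", colour_by = "label") +
  ggtitle("After ambient removal")
```

If the removal works, clusters that showed only low, uniform Hba-a1 signal before should drop towards zero afterwards.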

  1. Please re-cluster the cells after ambient RNA removal.

Remove doublets

  1. Please estimate doublets and evaluate whether they are enriched in any clusters. Then try to remove the doublet cells/clusters.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0422  0.3840  0.7385  0.9630  1.3082  7.2457

## 
##   no  yes 
## 2004  106
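
A doublet score of this kind can be computed with `computeDoubletDensity()` from scDblFinder; the sketch below is one option and may differ from how the output above was generated. It assumes `sce` is filtered, log-normalized and clustered. The `"griffiths"` thresholding method returns the labels singlet/doublet rather than no/yes, and the names `dbl.dens`, `dbl.calls` and `sce_singlets` are illustrative.

```r
library(scDblFinder)
library(scran)
library(scater)

set.seed(1000)
# Score each cell by the local density of simulated doublets
hvgs <- getTopHVGs(modelGeneVar(sce), prop = 0.1)
dbl.dens <- computeDoubletDensity(sce, subset.row = hvgs)
summary(dbl.dens)

# Turn the scores into calls (assumed thresholding method)
dbl.calls <- doubletThresholding(data.frame(score = dbl.dens),
                                 method = "griffiths",
                                 returnType = "call")
table(dbl.calls)

# Check whether the doublet calls pile up in particular clusters
sce$DoubletScore <- dbl.dens
sce$DoubletCall <- dbl.calls
table(Cluster = colLabels(sce), Call = sce$DoubletCall)
plotColData(sce, x = "label", y = "DoubletScore", colour_by = "DoubletCall")

# Keep the singlets only
sce_singlets <- sce[, sce$DoubletCall == "singlet"]
```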

Advanced QC plots

  1. Please estimate mitochondrial content (is.mito), read counts (sum), and gene counts (detected) for each cell. Then draw plots.
  2. Estimate the variance explained by each factor and determine which factor accounts for most of the variance.
## Warning: Removed 2785 rows containing non-finite values (stat_density).
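
Both QC steps could be sketched as follows. This assumes `sce` carries log-normalized counts and gene symbols in `rowData(sce)$Symbol`, and that mouse mitochondrial genes can be recognised by the `mt-` prefix. The subset name `Mito` is an arbitrary label.

```r
library(scater)

# Flag mitochondrial genes by symbol (assumed "mt-" prefix for mouse)
is.mito <- grepl("^mt-", rowData(sce)$Symbol)

# Adds sum, detected and subsets_Mito_percent to colData(sce)
sce <- addPerCellQC(sce, subsets = list(Mito = is.mito))

# Read counts against gene counts, coloured by mitochondrial content
plotColData(sce, x = "sum", y = "detected",
            colour_by = "subsets_Mito_percent")

# Percentage of variance in each gene explained by each factor
vars <- getVarianceExplained(sce,
  variables = c("sum", "detected", "subsets_Mito_percent"))
plotExplanatoryVariables(vars)
```

In the density plot, the factor whose curve lies furthest to the right explains the most variance across genes. Genes with no variance give non-finite values and are dropped with a warning.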